Optical Character Recognition for Hindi Language Using a Neural-network Approach

Authors

  • Divakar Yadav
  • Sonia Sánchez-Cuadrado
  • Jorge Morato
Abstract

Hindi is the most widely spoken language in India, with more than 300 million speakers. Unlike English, Hindi text has no separation between characters, so the Optical Character Recognition (OCR) systems developed for Hindi have a very poor recognition rate. In this paper we propose an OCR system for printed Hindi text in Devanagari script that uses an Artificial Neural Network (ANN) to improve its efficiency. One of the major reasons for the poor recognition rate is error in character segmentation. The presence of touching characters in scanned documents further complicates segmentation and is a major obstacle to designing an effective character segmentation technique. A general OCR system follows four major steps: preprocessing, character segmentation, feature extraction, and finally classification and recognition. The preprocessing tasks considered in the paper are conversion of grayscale images to binary images, image rectification, and segmentation of the document's textual contents into paragraphs, lines, words, and finally basic symbols. The basic symbols, obtained as the fundamental unit from the segmentation process, are recognized by the neural classifier. In this work, three feature extraction techniques are used to improve the recognition rate: histogram of projection based on mean distance, histogram of projection based on pixel value, and vertical zero crossing. These techniques are powerful enough to extract features even from distorted characters or symbols. The neural classifier is a back-propagation neural network with two hidden layers. The classifier is trained and tested on printed Hindi texts and achieves a correct recognition rate of approximately 90%.

Keywords: OCR, Pre-processing, Segmentation, Feature Vector, Classification, Artificial Neural Network (ANN)
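The abstract names binarization, projection histograms, and vertical zero crossing but does not define them, so the following is only a minimal sketch of one common reading of those ideas, not the authors' method. The function names, the fixed threshold of 128, and the choice to count background-to-ink transitions per column are all assumptions made for illustration. The "mean distance" variant of the projection histogram is not specified in the abstract and is left out.

```python
from typing import List

Image = List[List[int]]  # rows of pixel values


def binarize(gray: Image, threshold: int = 128) -> Image:
    """Map grayscale pixels to 1 (ink, darker than threshold) or 0 (background).

    The fixed threshold is an assumption; the abstract does not say how
    the grayscale-to-binary conversion is done.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def row_projection(binary: Image) -> List[int]:
    """Count ink pixels in each row (horizontal projection histogram)."""
    return [sum(row) for row in binary]


def column_projection(binary: Image) -> List[int]:
    """Count ink pixels in each column (vertical projection histogram)."""
    return [sum(col) for col in zip(*binary)]


def vertical_zero_crossings(binary: Image) -> List[int]:
    """For each column, count background-to-ink transitions from top to bottom.

    This is one plausible interpretation of "vertical zero crossing";
    the paper's exact definition may differ.
    """
    counts = []
    for col in zip(*binary):
        previous, crossings = 0, 0
        for px in col:
            if previous == 0 and px == 1:
                crossings += 1
            previous = px
        counts.append(crossings)
    return counts


def feature_vector(gray: Image, threshold: int = 128) -> List[int]:
    """Concatenate the three hypothetical features into one flat vector."""
    binary = binarize(gray, threshold)
    return (
        row_projection(binary)
        + column_projection(binary)
        + vertical_zero_crossings(binary)
    )
```

In a pipeline like the one described, a vector of this kind, computed for each segmented symbol after scaling it to a fixed size, would be the input to the neural classifier.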


Similar resources

Hindi Numeral Recognition using Neural Network

Handwriting has persisted as a means of communication and of recording information in day-to-day life even with the introduction of new technologies. The constant development of computer tools has led to the need for an easier interface between humans and computers. Handwritten character recognition may for instance be applied to Zip-Code recognition, automatic printed form acquisit...


Multi Lingual Character Recognition Using Hierarchical Rule Based Classification and Artificial Neural Network

Optical Character Recognition is one of the rapidly growing areas of Artificial Intelligence due to its vast applicability. The technique is used to recognize characters printed on paper or elsewhere. Optical character recognition gains importance when multiple languages are present, and the complexity of the problem increases with every language added. The identification of...


Handwritten Character Recognition Using Classifier

Character recognition comes into play when we want to recognize handwritten characters in a particular natural language. It can be done using various methods and algorithms that are already defined. Character recognition is essentially a pattern recognition problem and has been around for years now. Although there are implementations for handwritten characters in many natural language...


Recognition of Handwritten Hindi Characters using Backpropagation Neural Network

Automatic recognition of handwritten characters is a difficult task because characters are written in various curved & cursive ways, so they could be of different sizes, orientation, thickness, format and dimension. An offline handwritten Hindi character recognition system using neural network is presented in this paper. Neural networks are good at recognizing handwritten characters as these ne...


Recognition of Handwritten Devanagari Script Using Soft Computing

Development of a character recognition system for Devanagari is difficult because (i) there are about 350 basic, modified (“matra”) and compound character shapes in the script and (ii) the characters in a word are topologically connected. Here the focus is on the recognition of offline handwritten Hindi characters that can be used in common applications like bank cheques, commercial forms, governmen...



Journal title:
  • JIPS

Volume 9  Issue 

Pages  -

Publication date 2013